Stances in 'Introduction': Info & Library Science - Introduction 1 - Full text

Title: Multiple Heuristics and Their Combination for Automatic WordNet Mapping
Author(s): CHANGKI LEE, GARY GEUNBAE LEE and JUNGYUN SEO
Journal: Computers and the Humanities 38 (2004).
Introduction 1: Full text

Move 1: Establish A Territory

There is no doubt about the increasing importance of wide-coverage thesauri for NLP tasks, especially word sense disambiguation, machine translation, and information retrieval. While such thesauri (e.g. Roget’s thesaurus, WordNet (Miller, 1990)) exist for English, very few wide-coverage thesauri are available for other languages. Manual construction of a thesaurus by experts is the most reliable technique, yet it is also the most costly and time-consuming. For this reason, many researchers prefer an automatic approach and focus on the large-scale acquisition of lexical knowledge and semantic information from pre-existing lexical resources.

Move 2: Establish A Niche

This paper presents a novel approach to automatic WordNet mapping that uses word sense disambiguation. The method has been successfully applied to link Korean words from a bilingual dictionary to English WordNet synsets. To clarify the description, an example is given in Figure 1. To link the first sense of the Korean word ‘gwan-mog’ to a WordNet synset, we employ a bilingual Korean-English dictionary. The first sense of ‘gwan-mog’ has ‘bush’ as its English translation, and ‘bush’ has five synsets in WordNet. Therefore, the first sense of ‘gwan-mog’ has five candidate synsets. We must select one synset among the five candidates and link the first sense of ‘gwan-mog’ to it. As this example shows, semantic ambiguities arise when we link the senses of Korean words to WordNet synsets. To remove the ambiguities, we develop new word sense disambiguation heuristics to construct a Korean WordNet based on the existing English WordNet. We focus on the mapping of nouns, and our mapping target is limited to WordNet synsets that have one or more corresponding Korean noun senses. Some heuristics would not be meaningful for other parts of speech (i.e., adjectives, adverbs, and verbs).
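
The candidate-generation step in the ‘gwan-mog’ example can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the two lookup tables, the function name `candidate_synsets`, and the synset identifiers are hypothetical placeholders, and no real WordNet identifiers or glosses are reproduced. It shows only how the bilingual dictionary and WordNet together produce the ambiguity; choosing among the candidates is the job of the heuristics described later in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy bilingual dictionary:
# (Korean word, sense number) -> English translations of that sense.
BILINGUAL_DICT: Dict[Tuple[str, int], List[str]] = {
    ("gwan-mog", 1): ["bush"],
}

# Hypothetical stand-in for WordNet noun lookups. The identifiers are
# placeholders; only the count (five synsets for 'bush') comes from the text.
WORDNET_NOUN_SYNSETS: Dict[str, List[str]] = {
    "bush": [
        "bush-synset-1",
        "bush-synset-2",
        "bush-synset-3",
        "bush-synset-4",
        "bush-synset-5",
    ],
}


def candidate_synsets(word: str, sense: int) -> List[str]:
    """Collect every synset reachable through the translations of one word sense."""
    candidates: List[str] = []
    for translation in BILINGUAL_DICT.get((word, sense), []):
        for synset in WORDNET_NOUN_SYNSETS.get(translation, []):
            # Different translations may share a synset; keep each one once.
            if synset not in candidates:
                candidates.append(synset)
    return candidates


if __name__ == "__main__":
    for synset in candidate_synsets("gwan-mog", 1):
        print(synset)
```

With the toy data above, the first sense of ‘gwan-mog’ yields five candidates, one per synset of ‘bush’, which is exactly the ambiguity the disambiguation heuristics have to resolve.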

Move 3: Present the Present Work

This paper is organized as follows. In Section 2, we discuss previous research on automatic thesaurus acquisition and compare it with our own. In Section 3, we formally define the problem. In Section 4, we describe multiple heuristics for word sense disambiguation to solve the problem. In Section 5, we explain the method used to combine these heuristics so that together they perform better than each heuristic alone. Section 6 presents comprehensive experimental results and analyses for evaluation. Finally, we draw some conclusions and outline future research in Section 7. The automatic mapping-based Korean WordNet can play the role of a Korean-English bilingual thesaurus, which will be useful for Korean-English cross-lingual information retrieval and Korean-English machine translation.